perm filename NOTES.PDQ[LSP,JRA] blob sn#316161 filedate 1977-11-07 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00002 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	.SS(Syntax-directed I/O,SDIO,P105:)
C00017 ENDMK
C⊗;
.SS(Syntax-directed I/O,SDIO,P105:)
.FP
.BEGIN TURN ON "←";
It is frequently quite  convenient to enter input and
receive output in  something other than list notation.
Recall our diagram on {yon(P46)}.
We wish to mechanize   the
encoding of the input and the decoding of the output.  

Consider for example, the problem of simplification of algebraic expressions.
Many rather sophisticated simplifiers have been written
(⊗↑[Hea#68]↑, ⊗↑[MAC#74]↑). Assume that we have
one named %3simplify%* which expects list   input and gives list   output. Thus
for example:

.BEGIN SELECT 3;TABIT1(36)
.BOXA
(3+4)*x + x =%4I %*=> (PLUS (TIMES (PLUS 3 4) X) X) 
\=%4simplify%*=> (TIMES 8 X) =%4O %*=> 8*x 
.BOXB
.END
.FP
We would like transformations I and O done automatically. M-expr notation is
a candidate for such a task. Then we could write algorithms in M-expr
notation and have them executed by %3eval%1.

.BEGIN SELECT 3;TABIT1(36);
.BOXA
cons[A;B] =%4I %*=> (CONS (QUOTE A) (QUOTE B)) 
\=%4eval%*=> (A . B) =%4O %*=> (A . B)
.BOXB
.END
.FP
Transformation O is the identity in this case.

Frequently, the input and output transformations can be generated automatically.
We describe one such program, called SDIO for Syntax-Directed Input-Output
(⊗↑[Qua#68]↑.)
It was the forerunner of the  MLISP2 project; see  ⊗↑[Mli#73]↑.
 We will assume that the input and output
syntax is specified in BNF. With each BNF equation we will associate 
semantics describing the S-expr representation. The input transformation (parser) 
will
use this information to build the representation; and the output transformation
(unparser) will map the internal representation back.

Syntax directed I/O  is more than  a cosmetic.
 Assume we wish to
represent a structure as a particularly horrible list structure. We can give
augmented BNF equations specifying the external representation and the translation
to the underlying representation. Clearly when outputting these structures 
we do not want to see the internal representation. This can be particularly
annoying when we are debugging;  we wish  to concentrate on the misbehavior
of the 
algorithm; we do not want to be distracted by incomprehensible output.
Syntax directed output, or unparsing, can aid significantly.

The easiest introduction to SDIO is to examine an example. 
Consider the  proposed
simplification task above. The "natural" input syntax can be described in BNF.
We have given closely related syntax equations
on {yon(P68)}. We  will display  a few equations augmented by SDIO semantics. 
.GROUP;
.FP
For example:

.BEGIN TABIT2(16,43);
.BOXA
   <EXP>\::= <EXP> + <TERM>\=>(PLUS EXP TERM)
\::= <TERM>\=>*
.PT2
   <TERM>\::= <NUMBER>\=>*
.BOXB
.END
.APART
.FP
To the input parser the first BNF equation means: whenever the right hand side 
is recognized, reduce that occurrence to the left hand side and associate with it
the list consisting of the atom PLUS, the S-expr associated with the occurrence of
<EXP>, and the S-expr associated with the occurrrence of <TERM>.
The second equation means reduce <TERM> to <EXP> associating whatever S-expr
is attached to <TERM> with that occurrence of <EXP>. In the third equation
we assume that <NUMBER> is a syntactic type recognized by the scanner, and return
that number as the semantic value.
For example, if such a parser were given 2+3+44 it should return the list
%3(PLUS (PLUS 2 3) 44)%*.

The unparser uses these equations in the inverse manner. It will see a S-expr
and will attempt to match that to the description of the semantics, outputting
an instance of the BNF if successful.

The SDIO program will take such an augmented set of BNF equations and
generate a parser and an unparser for the language.
This project involves writing such a SDIO program. We describe a basic
SDIO program and suggest extensions and improvements.

The best way to describe the format of  SDIO input is to give an SDIO
description.
.BOXA
.BEGIN TABIT2(19,46);GROUP;

<RULES>\::= END\=>NIL
\::=<RULE><RULES>\=>(RULE . RULES)
.PT2
<RULE>\::= <LFPT><RTLST>\=>(LFPT RTLST)
.PT2
<RTLST>\::= ::=<RTPT><SEXPR><RTLST>
\\=>((RTPT SEXPR) . RTLST)
\::= %9ε%*\=>NIL
.PT2
<LFPT>\::= <<ID>>\=>*
.PT2
<RTPT>\::= "=>\=>NIL
\::= <RPELEM><RTPT>
\\=>(RPELEM . RTPT)
.PT2
<RPELEM>\::= <<ID>>\=>*
\::= <ID>\=>(SPWD ID)
\::= ""<CHAR>\=>(QCH CHAR)
\::= <CHAR>\=>(CH CHAR)
.PT2
<SEXPR>\::= <ATOM>\=>*
\::= (<SEXPRLIST>)\=>*
.PT2
<SEXPRLIST>\::= <ATOM>	\=>*
\::= <SEXPR> <SEXPRLIST>
\\=>(SEXPR . SEXPRLIST)
\::= %9ε%*\=>NIL
.BOXB
END
.END
.FP
The expressions %3(SPWD ID), (QCH CHAR),%1 and %3(CH CHAR)%1  are S-expr
representations of calls on rountines to process special or reserved words,
quoted characters or special characters, respectively.


The input  to SDIO is a sequence of augmented BNF equations terminated with END.
What the SDIO program sees is a S-expr representation of these equations. 
The sample equations for <EXP> above would pass the following to the SDIO program:

.BEGIN TABIT3(10,15,25);GROUP;

\(\(EXP (\((EXP (CH +) TERM) (PLUS EXP TERM)) 
\\\((TERM) NIL)))
\\(TERM (\((NUMBER) NIL))) )
.END
.boxb
.FP
The SDIO program generates   the parser and unparser.

The elements of the BNF equations in SDIO are rather standard: syntactic variables,
which are
identifiers bracketed by "<" and ">"; 
and special words, which are identifiers; and
special characters, which are preceeded by  " if they conflict with the special
characters of the BNF.

The elements of the semantics are: unbracketed syntactic variables occurring
in the RHS of the associated BNF equation; other identifiers, taken as constants;
NIL, the LISP atom; notation for %3cons%*-ing, %3(###.###)%*; notation for
making a list, %3(e%41%*#...#e%4n%*)%1; the character *, described above.
.BOXA
.GROUP
.CENT(Project)
.PT2
.FP
Write such a SDIO program. You should consult local LISP documentation 
when building the basic I/O routines like <NUMBER>, <CHAR>, or <ID>.
.BOXA
.APART
.GROUP;
.CENT(First Extension)
.PT2
.FP
You may have noticed already that the basic SDIO program fails to
distinguish two occurrences of the same syntactic variable on the RHS of an
equation. Thus an equation like:

.BEGIN TABIT1(11);GROUP;
.BOXA
<ZIP>\::= <ZAP> <FOO> <ZAP>  must be replaced by the pair:
.PT2
<ZIP>\::= <ZAP> <FOO> <ZAP1>
.PT2
<ZAP1>\::=<ZAP>
.BOXB
.END
.APART
.FP
This trick is called ⊗→stratification↔←. It is a syntactic trick, adding
nothing to the semantics.

Add notation to the semantics of your SDIO program to handle RHS with 
multiple occurrences of syntactic variables. Modify your parser generators
accordingly.
.BOXA
.GROUP;
.CENT(Second Extension)
.PT2
.FP
Besides building a S-expr representation of the input, it is frequently
desirable to generate other information during the input parse. Lists of
occurrences of operators or other tables are commonly needed. The additional
information could be discovered by examination of the completed parse tree,
but that requires reexamination of the tree. It is much more efficient to
do as much as possible on a single pass.
.APART

Introduce notation which will allow execution of arbitrary LISP code
as the parse progresses. That code should be able to manipulate
any of the semantic properties associated with the syntactic variables appearing
in the RHS of the associated  syntax equation.
.BOXA
.GROUP;
.CENT(Third extension)
.PT2
.FP
While it is obviously advantageous to produce output in the language
described by the BNF equations rather than the S-expr form, formatting
of the output can be equally beneficial.
We should like to be able to specify formatting information in SDIO
such that spacing and line-length are controlled.
.APART
One proposal is to embed spacing and line-feed control characters in the
BNF equations. The spacing character is "→" and the line-feed is "↓".
The "↑" sets the indentation point for the string on its right; and the "→"
followed by a digit says space over than number of spaces from the
last indentation point if the remaining  space on the line is not sufficient to
contain all text specified by the remaining RHS of the equation.
"0→", meaning go to the indentation point can be written "→".
.FP
.GROUP;
For example consider the following:
.BEGIN TABIT2(13,32);GROUP;
.BOXA
\<EXPR>\::= <ID>( ↓ <EXPR_LIST>)
\\::= <ID>
.PT2
\<EXPR_LIST>\::= ↑<EXPR> , →<EXPR_LIST>
\\::= ↑<EXPR>
.BOXB
.END
.APART
.GROUP;
.FP
These equations, when used to drive an unparser, could give:
.BEGIN TABIT2(10,18);SELECT 3;
.BOXA
\mumf(\a,
\\foobaz(garp(b)),
\\bletch(a,b,c),
\\d)
.END
.FP
as the formatted version of:
.BEGIN CENTER; SELECT 3;
.BOXA
mumf(a,foobaz(garp(b)),bletch(a,b,c),d)
.BOXB
.END

Extend SDIO to handle formatting of output.
.APART
.END